Large-scale phylogenomics of aquatic bacteria reveal molecular mechanisms for adaptation to salinity

The crossing of environmental barriers poses major adaptive challenges. Rareness of freshwater-marine transitions separates the bacterial communities, but how these are related to brackish counterparts remains elusive, as do the molecular adaptations facilitating cross-biome transitions. We conducted large-scale phylogenomic analysis of freshwater, brackish, and marine quality-filtered metagenome-assembled genomes (11,248). Average nucleotide identity analyses showed that bacterial species rarely existed in multiple biomes. In contrast, distinct brackish basins cohosted numerous species, but their intraspecific population structures displayed clear signs of geographic separation. We further identified the most recent cross-biome transitions, which were rare, ancient, and most commonly directed toward the brackish biome. Transitions were accompanied by systematic changes in amino acid composition and isoelectric point distributions of inferred proteomes, which evolved over millions of years, as well as convergent gains or losses of specific gene functions. Therefore, adaptive challenges entailing proteome reorganization and specific changes in gene content constrain the cross-biome transitions, resulting in species-level separation between aquatic biomes.


Supplementary Data 1
A spreadsheet with detailed results of clustering and MSG identification. It contains a table annotating MAGs to >95% ANI clusters, with the representatives chosen for further analysis marked, as well as sheets with just the representatives, the clusters common between the brackish basins, and the clusters common between the biomes. The 2nd sheet (MSG_table) is a table with all the MAGs within identified monobiomic sister groups (MSGs), annotated to the appropriate transition_ID, biome, and transition type. Taxonomic classification and transition times and directions are also included in this table. The first sheet also contains accession numbers for the bacterial MAGs used in this study.
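As an illustration of how MAGs can be annotated to >95% ANI clusters, the following minimal sketch groups genomes by single-linkage clustering over pairwise ANI values. The function name, the input format, the toy ANI values, and the choice of single linkage are assumptions made for illustration; they do not describe the clustering pipeline used in this study.

```python
def cluster_by_ani(genomes, ani_pairs, threshold=95.0):
    """Group genomes so that any pair with ANI > threshold shares a cluster.

    Single linkage via union-find; an assumed scheme, not the study's tool.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in ani_pairs.items():
        if ani > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input: genome names and ANI values are invented.
genomes = ["mag_A", "mag_B", "mag_C", "mag_D"]
ani_pairs = {
    ("mag_A", "mag_B"): 97.2,
    ("mag_B", "mag_C"): 95.8,
    ("mag_A", "mag_C"): 94.1,
    ("mag_C", "mag_D"): 88.0,
}
clusters = cluster_by_ani(genomes, ani_pairs)
print(clusters)
```

Under single linkage, mag_A and mag_C end up in the same cluster through mag_B even though their direct ANI is below the threshold; a stricter linkage rule would split them.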

Supplementary Data 2
Constraint file used for estimating time since divergence, input for RelTime (MEGA11). Minimal estimates of time since host species diverged [ma], based on the fossil record, were used to set the constraints (see Supplementary Table S1).

Supplementary Data 3
A spreadsheet with detailed results of the comparison of isoelectric point (pI) distributions and amino acid compositions of proteomes across pairs of MSGs (transitions). Statistics (p-values and difference sizes) for pairwise comparisons of inferred proteome properties and composition, i.e. i) relative frequencies of acidic, neutral, and basic proteins (pI categories); ii) genome sizes, defined as the number of inferred protein-coding genes; iii) amino acid relative frequencies; iv) relative frequencies of amino acid categories. Each set of statistics is followed by a table with the changes across each of the identified transitions (MSG pairs) given separately, connected to transition ID, taxonomy, and transition type ("tr_diffs" in the name of the sheet; for pIs, changes for the 3 protein categories as well as for 0.5 pH ranges are given, the latter named by the value in the middle of the range).
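To make the 0.5 pH ranges concrete, the sketch below bins protein pI values into ranges of width 0.5, names each range by its midpoint, and reports the change in relative frequency between two proteomes. The function names, the direction of the difference (second proteome minus first), and the toy pI values are assumptions for illustration only and are not taken from the analysis code of this study.

```python
import math
from collections import Counter


def pi_bin_frequencies(pi_values, width=0.5):
    """Relative frequency of proteins per pI range, keyed by the range midpoint."""
    counts = Counter()
    for pi in pi_values:
        lower = math.floor(pi / width) * width  # lower edge of the range
        counts[round(lower + width / 2, 2)] += 1
    total = len(pi_values)
    return {mid: n / total for mid, n in sorted(counts.items())}


def frequency_change(proteome_a, proteome_b, width=0.5):
    """Per-range change in relative frequency (proteome_b minus proteome_a)."""
    freq_a = pi_bin_frequencies(proteome_a, width)
    freq_b = pi_bin_frequencies(proteome_b, width)
    mids = sorted(set(freq_a) | set(freq_b))
    return {mid: freq_b.get(mid, 0.0) - freq_a.get(mid, 0.0) for mid in mids}


# Hypothetical pI values for two small proteomes (invented numbers).
msg_a = [4.6, 4.8, 5.2, 9.1]
msg_b = [4.7, 9.2, 9.4, 9.6]
changes = frequency_change(msg_a, msg_b)
print(changes)
```

With these toy values, the range 4.5-5.0 is reported under the key 4.75, in line with naming each range by the value in its middle.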

Supplementary Data 4
A spreadsheet with detailed results on identified significantly differentially present (gained/lost) genes across pairs of MSGs (transitions). Sheets 1-3: results of MSG-based (phylogeny-aware) gene content analysis. Tables with all the significant (FDR < 0.1, shaded in orange) differentially present genes across pairs of MSGs. For FB and FM type transitions, additional genes were added to the table to show at least the top 25 most significant genes regardless of the FDR values. Sheets 4-6: Biome(s) in which the differentially present KOs were found across the identified transitions (MSG pairs), i.e. the data presented in Fig. 6 in text form and annotated to more specific taxa and single transition events. Includes taxonomic annotation of the transitions and numbers of bacterial species in MSGs from respective biomes. Sheets 7-9: Fraction of cases in which gene A (row) was also annotated as gene B (column), based on {transition type}.annotation.gz files. Sheets 10-12: Results of phylogeny-unaware gene content analysis. Tables with all the significant (FDR < 0.1) differentially present genes from an unpaired comparison of all bacterial species from each biome.
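The co-annotation fractions in sheets 7-9 can be pictured with the following minimal sketch, which computes, for each ordered pair of annotations (A, B), the fraction of genes annotated as A that were also annotated as B. The function name, the input representation (one set of KO identifiers per gene), and the placeholder identifiers are assumptions; parsing of the {transition type}.annotation.gz files is deliberately omitted because their format is not described here.

```python
from collections import defaultdict


def coannotation_fractions(gene_annotations):
    """Fraction of genes annotated as A (first key) that are also annotated as B.

    gene_annotations: iterable of sets, one set of KO identifiers per gene.
    """
    totals = defaultdict(int)  # number of genes carrying annotation A
    joint = defaultdict(int)   # number of genes carrying both A and B
    for kos in gene_annotations:
        for a in kos:
            totals[a] += 1
            for b in kos:
                if a != b:
                    joint[(a, b)] += 1
    return {(a, b): n / totals[a] for (a, b), n in joint.items()}


# Hypothetical toy input with placeholder identifiers (not real results).
genes = [{"KO_A", "KO_B"}, {"KO_A"}, {"KO_A", "KO_B"}, {"KO_A"}]
fractions = coannotation_fractions(genes)
print(fractions[("KO_A", "KO_B")], fractions[("KO_B", "KO_A")])
```

The resulting table is asymmetric by construction: in the toy input every gene annotated as KO_B also carries KO_A, but only half of the KO_A genes carry KO_B.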

[Figure axis label: Time since divergence [ma]]
between transition and "no-transition" events would depend on how much the rate of change increases after transitions. Under a highly increased rate of change (orange line), only in the early stages (dark-gray background in b.), the differences are indistinguishable from the noise.
If the rate increase is weaker, it takes substantially longer for the differences to reach the same level (light-gray background in b.). Results shown in Fig. 5g-i suggest that model B is the most common dynamic in the bacterial world.

Supplementary Fig. S8 | Results of over-representation analysis of the functional categories (KEGG Orthology level C) of differentially present KOs (i.e. gained/lost genes).
The probability of obtaining the observed number of KOs annotated to a category among the differentially present KOs, as if they were drawn at random from all the KOs analyzed for a transition type, was assessed using a hypergeometric test.
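The hypergeometric test described above can be sketched as follows. This is a minimal pure-Python version that computes the upper-tail probability P(X >= observed); the use of the upper tail, the function name, the parameter names, and the toy numbers are assumptions, since the exact tail and software used are not specified here.

```python
from math import comb


def hypergeom_upper_tail(population, category_total, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, category_total, drawn).

    population:     all KOs analyzed for a transition type
    category_total: KOs in the population annotated to the category
    drawn:          number of differentially present KOs
    observed:       differentially present KOs annotated to the category
    """
    upper = min(category_total, drawn)
    favourable = sum(
        comb(category_total, k) * comb(population - category_total, drawn - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)


# Toy numbers (invented): 10 KOs in total, 4 in the category, 3 differentially
# present, 2 of which fall in the category.
p_value = hypergeom_upper_tail(population=10, category_total=4, drawn=3, observed=2)
print(p_value)  # (6*6 + 4*1) / 120 = 1/3
```

When many categories are tested, such p-values would normally be corrected for multiple testing; which correction, if any, was applied to this figure is not stated in the caption.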

Supplementary Discussion
Please note that information on "Role" and "Regulation" is mostly based on literature, while "Potential mechanism" is more speculative. In some cases, the genes' roles in adaptation or response to salinity have been previously described, and the citations are provided accordingly. Names of genes found in both FB and FM transitions are underlined, and the information for these is repeated since the direction of change was the same in all cases (from lower to higher salinity). Only the literature names from the cited papers are given and may refer to either the gene itself or its product (protein).
Please note that all the remarks on function and regulation come from the model organisms in which the genes were studied and may vary across the bacterial tree of life.
The roles in uninvestigated processes and regulatory processes are usually not mentioned. The lack of information is implied, though we encourage you to dive deeper into the literature when needed. Overall, do not treat these notes as conclusions on why these genes are differentially present across MSGs but rather as our hypotheses. We hope these notes can serve as a starting point for anybody interested in further investigations into the potential role of these genes in adaptation to salinity and beyond.
Abbreviations: MAG - metagenome-assembled genome; MSG - monobiomic sister group; FB - freshwater ↔ brackish; BM - brackish ↔ marine; FM - freshwater ↔ marine

Potential mechanism: Controlled production of polyamines can be a mechanism of plastic response to changes in salinity, which are characteristic of brackish waters as opposed to the marine environment with its relatively stable salt concentrations. In some cases, it may be connected to post-translational modifications through K00809. Potentially promotes adaptive phenotypic plasticity allowing life in different osmolarity/salinity.
Use of urea as a nitrogen source has been observed to be gained in specific groups of bacteria transitioning to more N-limited environments with lower salinity (54,56). Thus, under N deficiency the polyamine-producing pathway may also run in the direction opposite to the conventional one. Reactions and identified genes responsible for them: (1) putrescine + urea ⇌ agmatine + H2O (K01480); (2) agmatine + CO2 ⇌ L-arginine (K01585, with a potential role of carbonic anhydrase (K01673) in concentrating CO2).

K04759: ferrous iron transport protein B
More often present in MAGs from brackish MSGs.
Regulation: The cytoplasmic GTPase domain is believed to regulate transport (141). The in vivo active form is probably the complex with FeoA and FeoC (142). All genes of the Feo system are in one operon, believed to be regulated by metal availability (143).
Potential mechanism: While the Feo system is present in many marine genomes, it is often present only together with other iron uptake systems, especially Fe3+ transporters (144). FeoB structure has been shown to be especially sensitive to excess salinity (145). A change in salinity can also inhibit Fe oxidation (or a decrease in salinity can enable acquisition of brackish/freshwater oxidation pathways) (146). It may be that bacteria switch between different iron transporters due to these pressures.
A confounding factor might be the extent of hypoxia in the Baltic Sea, and the greater depth of samples from this basin, meaning the bacteria in question might have access to more Fe2+, which is scarce in surface marine environments.
This gene is also more often present in brackish (and freshwater) picocyanobacteria than in their marine relatives (52).

K00809: deoxyhypusine synthase
More often present in MAGs from brackish MSGs.

Literature names:
Role: Identified in eukaryotes and Archaea, where it is responsible for hypusination of IF-5A. However, bacteria do not have IF-5A. Traces of horizontal gene transfer of deoxyhypusine synthases from Archaea to bacteria have long been known (147), yet the role of the transferred genes remains unexplained. The second substrate in hypusination, besides the protein, is spermidine, a polyamine whose synthesis requires K01480 and K01585.

Regulation:
Potential mechanism: Possibly connects polyamine synthesis to post-translational modifications. Can influence charges on protein surface.

K01585: arginine decarboxylase
More often present in MAGs from brackish MSGs.

Literature names: SpeA
Role: Decarboxylates arginine to agmatine. Thus, the enzyme is directly upstream of K01480 in polyamine synthesis (135) (see K01480 for more on polyamines).

Regulation: Inhibited by cAMP, repressed by putrescine (downstream product of polyamine synthesis) (148).
Potential mechanism: Controlled production of polyamines can be a mechanism of plastic response to changes in salinity, which are characteristic of brackish waters as opposed to the marine environment with its relatively stable salt concentrations. May be connected to post-translational modifications through K00809.
Use of urea as a nitrogen source has been observed to be gained in specific groups of bacteria transitioning to more N-limited environments with lower salinity (54,56). Thus, under N deficiency the polyamine-producing pathway may also run in the direction opposite to the conventional one. Reactions and identified genes responsible for them: (1) putrescine + urea ⇌ agmatine + H2O (K01480); (2) agmatine + CO2 ⇌ L-arginine (K01585, with a potential role of carbonic anhydrase (K01673) in concentrating CO2).

K03782: catalase-peroxidase
More often present in MAGs from brackish MSGs.

Literature names: katG
Role: Can act as a catalase (breaking down H2O2 to water and oxygen) as well as a peroxidase, an enzyme which uses H2O2 to oxidize other compounds. The latter function is crucial for degradation of many xenobiotics, and thus the abundance of this gene in metagenomes has been connected to levels of water contamination (149), especially by polycyclic aromatic hydrocarbons (150). Involved in antibiotic resistance against isoniazid (149).

Regulation: Expression is inducible by stresses that trigger production of reactive oxygen species, such as heat shock (151).
Potential mechanism: Enclosed brackish basins accumulate higher concentrations of many xenobiotic substances than open oceans, and the Baltic Sea has recently been reported to have up to 10 times higher concentrations of polycyclic aromatic hydrocarbons than the North Sea (150). Thus, in these brackish environments, recent pressures on acquisition of genes, rather than transition-related changes, might be the reason behind the difference in gene presence between the MSG pairs.
Could also be involved in the degradation of organic compounds coming from dissolved organic matter, or produced during its degradation. These are generally more diverse and present in higher concentrations in brackish water due to terrestrial input.
This gene is also more often present in brackish (and freshwater) picocyanobacteria than in their marine relatives (52).

Regulation: Expression induced by amino acid starvation (152).
Potential mechanism: Sign of a difference in the mobile genetic elements which have spread and/or are being spread in the biomes: the superintegron seems to be more common in the brackish than in marine environments. Genome streamlining in oligotrophic marine environments may also play a role.

K19159: antitoxin YefM
More often present in MAGs from brackish MSGs.

Regulation: Autorepression (153).
Potential mechanism: Sign of a difference in the mobile genetic elements which have spread and/or are being spread in the biomes: the cassette seems to be more common in the brackish than in marine environments. Genome streamlining in oligotrophic marine environments may also play a role.

K07064: uncharacterized protein
More often present in MAGs from brackish MSGs.

Literature names:
Role: Regulation: Potential mechanism: Uncharacterized protein. Its clustering with the antitoxins could guide characterization.
Regulation: Complex regulatory system involving cyclic-di-GMP, expression induced by cold and starvation (salinity not assessed) (155).
Potential mechanism: Less specific protein modifications or regulated production of intracellular osmolytes as plastic responses to changing salinity levels.
Note on co-annotation: K05844 was the highest-confidence annotation in all cases. There were also genes annotated to K05844 and not to K18310/K14940, but not the other way round. Thus, we suggest putting more significance on K04100. However, the genes are homologs (75,76) and the actual genes might well be other bacterial paralogs of K04100. Focused phylogenetic studies of the genes in specific MAGs could help to guide further investigations into the actual mechanism at play.

Regulation:
Potential mechanism: Probably a misannotation due to similarity with K05844 and K14940. It seems that KEGG has different annotations for bacterial, archaeal, and eukaryotic homologs, since they perform different functions, and whether they are orthologs or paralogs might be difficult to distinguish. This may suggest that the actual gene functions are similar to, but not the same as, those of any of the genes, potentially including other pathways and protein modifications involving glutamic acid or glutamate.

Regulation:
Potential mechanism: Possibly a misannotation due to similarity with K05844; however, horizontal transfer from Archaea cannot be excluded if the gene can find an alternative function in bacterial cells. It seems that KEGG has different annotations for bacterial, archaeal, and eukaryotic homologs, since they perform different functions, and whether they are orthologs or paralogs might be difficult to distinguish. This may suggest that the actual gene functions are similar to, but not the same as, those of any of the genes, supporting the hypothesis that modifications of targets other than ribosomal protein S6 can be at play.

K03284: magnesium transporter
More often present in MAGs from brackish MSGs.

Literature names: corA
Role: Passive transport of Mg2+, as well as other ions such as Co2+, Ni2+, and Zn2+ (Stetsenko and Guskov 2020). One of the major and most common bacterial magnesium uptake systems and, as opposed to others, it can also facilitate Mg2+ efflux (111).
Potential mechanism: Magnesium is a major component of sea salt, and decreasing salinity may create a need to acquire alternative/additional magnesium uptake systems.
Clustering with K03282 suggests that the transporter may play a role in either managing hypoosmotic stress (through ion efflux) or recovery from it (by replenishing the pool of Mg2+ and other transported cations). Thus, it potentially promotes adaptive phenotypic plasticity allowing life in different osmolarity/salinity.
This gene is also more often present in brackish (and freshwater) picocyanobacteria than in their marine relatives (52).

K03282: large conductance mechanosensitive channel
More often present in MAGs from brackish MSGs.

Literature names: mscL
Role: Mechanosensitive channel involved in responses to hypoosmotic shock; it prevents a build-up of turgor pressure that would destroy the cell, allowing accommodation to new conditions and further growth (44). It has the highest conductance among Escherichia coli mechanosensitive channels (44) and is thus responsible for the strongest response to hypoosmotic stress.

Regulation: Mechanosensitive channel; its activity is regulated by tension on the membrane.
Potential mechanism: Directly connected to hypoosmotic stress, which is possibly experienced in the course of marine-to-brackish transitions. Brackish bacteria experience bigger changes in the salinity of their immediate environment. Potentially promotes adaptive phenotypic plasticity allowing life in different osmolarity/salinity.
environmental cues to which the sensor can respond is needed to hypothesize about its importance.

K07343: DNA transformation protein and related proteins
More often present in MAGs from brackish MSGs.

Literature names: tfoX
Role: Together with another gene, HapR, it regulates the expression of comEA (K02237), and thus natural competence (185). It also regulates the expression of the type VI secretion system and interbacterial killing (185). It is activated in response to chitin, and it has been suggested to be crucial for the colonization of chitinous surfaces by Vibrio species, at the same time reducing motility through repression of the related protein TfoY (185).
Potential mechanism: Two components of the competence system (see also K02237 and K02238), together with this regulator of competence, are all differentially present across the BM MSG pairs. Another of the identified genes, K03630, was originally thought to take part in competence but was later shown to be dispensable for the process, though its expression is induced in competent bacteria. Therefore, natural competence is probably more important for adaptation to brackish conditions than the results for the single genes suggest. The additional role of K02238 in the regulation of type IV pili expression may explain why its pattern of presence/absence changes differs from that of K02237, which is also generally present in fewer of the MAGs.
Brackish environments are less stable than open oceans and impose pressure to adapt to a wider salinity spectrum. Brackish bacteria might benefit from higher genomic plasticity, as it allows adaptation to disruptions in the ecosystem and colonization of different niches across the salinity gradient (through acquisition of genes shifting optimal/tolerated salt concentrations).
The importance of this gene for colonization of chitinous surfaces may suggest that association with an animal host has a role in transitions. Potentially, as animals adapt to a shift in salinity related to the formation of a brackish basin, but local bacteria get outcompeted by the global brackish microbiome, association with a host is a survival strategy for the local bacteria. The host-associated bacteria are less likely to come from other, distant brackish basins, and niches on the host surface open up.

K03630: DNA repair protein RadC
More often present in MAGs from brackish MSGs.

Literature names: radC
Role: Originally connected to competence and UV-light protection (DNA repair); this connection has been called into question by results on Streptococcus pneumoniae (186). However, it might be that the gene present in Streptococcus pneumoniae, rather than the whole orthology group, specifically lacks the UV-protecting properties, which have been shown in Rhodobacter capsulatus (187), a species ecologically more susceptible to UV light.

Regulation: It is induced in competent bacteria (186) and in
Potential mechanism: Might be connected to repair of transferred DNA in competent bacteria, though this role has been called into question by evidence that is so far limited to Streptococcus pneumoniae (186).
A role in DNA repair in response to UV light runs counter to the shift in light wavelengths observed at the higher dissolved organic carbon levels of lower salinities, as the organic particles scatter short-wavelength light. However, clustering with genes responsible for chemotaxis gives a possible explanation. As bacteria move, led by environmental cues, they may migrate to different, higher parts of the water column, exposing themselves to stronger irradiation.

K03413: chemotaxis family, chemotaxis protein CheY
More often present in MAGs from brackish MSGs.

Literature names: CheY
Role: "Diffusible response regulator", key gene for chemotaxis based on flagellar motion (188). CheY proteins can respond to multiple and various environmental cues, even within one bacterial cell (189).
Regulation: Activity regulated in response to environmental cues.
Potential mechanism: Gain/loss of this protein strongly points towards changes in chemotaxis. The diversity of potential cues to which it responds makes it hard to speculate about the specific mechanism. However, it is highly probable that cheY genes responding to different factors are gained by different bacteria, and its differential presence points towards the importance of responses to gradients of environmental factors, which are characteristic of most brackish environments and usually much more pronounced there than in the open ocean. Potentially promotes adaptive phenotypic plasticity allowing movement across environmental gradients in the brackish biome.
The fact that this gene clusters with a weakly characterized sensor histidine kinase (K20974) involved in motility, instead of the literature partner cheA (K03407, not among the identified genes), suggests that an alternative signal transduction pathway is at play.

K20974: two-component system, sensor histidine kinase
More often present in MAGs from brackish MSGs.

Literature names: Hpt
Role: Histidine kinase, needed for swarming activity and biofilm formation (47).
Regulation: As a histidine kinase, it is involved in transducing signals in the cell, and its activity depends on the proteins with which it interacts.
Potential mechanism: This gene, involved in motility, clusters with CheY (K03413), a key chemotaxis protein. This suggests that an alternative chemotaxis transduction pathway is at play, using this kinase instead of the literature partner of cheY, cheA (K03407, not among the identified genes). It probably allows bacteria to respond to at least one of the environmental factor gradients characteristic of brackish environments.